Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add @inbounds to iterate method for AbstractArray: 3x speedup #57112

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented Jan 21, 2025

No description provided.

@nsajko
Copy link
Contributor Author

nsajko commented Jan 21, 2025

Timings:

struct Vec{T} <: AbstractVector{T}
    mem::Memory{T}
end
Base.IndexStyle(::Type{<:Vec}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(A::Vec, i::Int) = A.mem[i]
Base.size(a::Vec) = size(a.mem)

function f(v)
    z = eltype(v)(false)
    for x  v
        z += x
    end
    z
end

using BenchmarkTools
using Random: rand!

const vec = let m = Memory{Int8}(undef, 4096)
    rand!(m)
    Vec(m)
end

@btime f(vec) setup=(rand!(vec.mem);)
# 91.502 ns (0 allocations: 0 bytes)

# redefine the methods: placebo
function Base.iterate(A::AbstractArray, state=(eachindex(A),))
    y = iterate(state...)
    y === nothing && return nothing
    A[y[1]], (state[1], Base.tail(y)...)
end

@btime f(vec) setup=(rand!(vec.mem);)
# 91.764 ns (0 allocations: 0 bytes)

# redefine the methods: add `@inbounds`
function Base.iterate(A::AbstractArray, state=(eachindex(A),))
    y = iterate(state...)
    y === nothing && return nothing
    @inbounds A[y[1]], (state[1], Base.tail(y)...)
end

@btime f(vec) setup=(rand!(vec.mem);)
# 31.134 ns (0 allocations: 0 bytes)

Info:

julia> versioninfo()
Julia Version 1.12.0-DEV.1906
Commit 17402876ec2 (2025-01-18 07:33 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver2)
  GC: Built with stock GC
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

@nsajko nsajko added performance Must go faster arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol labels Jan 21, 2025
@jakobnissen
Copy link
Contributor

I believe this goes against an existing principle to not add inbounds to generic code - see e.g. the docs for inbounds.
The issue is that this change will turn an ordinary bug in eachindex into undefined behaviour. This breaks the principle that undefined behaviour should only occur due to a locally wrongly placed unsafe operation.

@nsajko
Copy link
Contributor Author

nsajko commented Jan 21, 2025

IMO a correct eachindex is a pretty low bar 🤷

And there's always --check-bounds=yes, turned on by default during testing.

@nsajko
Copy link
Contributor Author

nsajko commented Jan 21, 2025

There's another drawback to this PR, though: @inbounds thrashes the effects. This might matter for some array type that relies on constant folding a lot, I guess.

@jishnub
Copy link
Contributor

jishnub commented Jan 28, 2025

#40397 has prior discussion on this

@nsajko nsajko closed this Jan 28, 2025
@nsajko nsajko deleted the iterate_abstractarray_3x_speedup_by_adding_an_inbounds branch January 28, 2025 15:12
@nsajko nsajko added the duplicate Indicates similar issues or pull requests label Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] duplicate Indicates similar issues or pull requests iteration Involves iteration or the iteration protocol performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants